92 research outputs found

    Prediction of scientific collaborations through multiplex interaction networks

    Link prediction algorithms can help to understand the structure and dynamics of scientific collaborations and the evolution of science. However, available algorithms based on similarity between nodes of collaboration networks are constrained by the limited number of links present in these networks. In this work, we mitigate this intrinsic limitation by generalizing the Adamic-Adar method to multiplex networks composed of an arbitrary number of layers that encode diverse forms of scientific interaction. We show that the new metric outperforms other single-layered, similarity-based scores and that scientific credit, represented by citations, and common interests, measured by the usage of common keywords, can be predictive of new collaborations. Our work paves the way for a deeper understanding of the dynamics driving scientific collaborations, and provides a new algorithm for link prediction in multiplex networks that can be applied to a plethora of systems.
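
The abstract does not state the multiplex formula, so the following is only a minimal sketch of one plausible generalization: the classic Adamic-Adar score is computed in each layer and the per-layer scores are combined with a weighted sum. The function names, the layer names, and the weights are all hypothetical and chosen for illustration; the paper's actual metric may differ.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbours)."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return adj


def adamic_adar(adj, u, v):
    """Classic single-layer Adamic-Adar score: sum of 1/log(degree) over common neighbours."""
    common = adj.get(u, set()) & adj.get(v, set())
    # A common neighbour of two distinct nodes has degree >= 2; the guard
    # only matters for degenerate calls such as u == v.
    return sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)


def multiplex_adamic_adar(layers, weights, u, v):
    """ASSUMPTION: combine layers by a weighted sum of per-layer scores."""
    return sum(weights[name] * adamic_adar(adj, u, v) for name, adj in layers.items())


# Toy two-layer network (hypothetical data).
layers = {
    "collaboration": build_adjacency([("a", "c"), ("b", "c")]),
    "citation": build_adjacency([("a", "d"), ("b", "d"), ("d", "e")]),
}
weights = {"collaboration": 1.0, "citation": 0.5}
score = multiplex_adamic_adar(layers, weights, "a", "b")
```

In this toy network, authors `a` and `b` share one neighbour in each layer, so both layers contribute to the score even though the two have never collaborated directly.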

    Towards a data-driven characterization of behavioral changes induced by the seasonal flu

    In this work, we aim to determine the main factors driving self-initiated behavioral changes during the seasonal flu. To this end, we designed and deployed a questionnaire via Influweb, a Web platform for participatory surveillance in Italy, during the 2017-18 and 2018-19 seasons. We collected 599 surveys completed by 434 users. The data provide socio-demographic information, level of concern about the flu, past experience with illnesses, and the type of behavioral changes voluntarily implemented by each participant. We describe each response with a set of features and divide the responses into three target categories. These correspond to respondents who report (i) no (26%), (ii) only moderate (36%), or (iii) significant (38%) changes in behavior. In this setting, we adopt machine learning algorithms to investigate the extent to which the target variables can be predicted by looking only at the set of features. Notably, 66% of the samples in the category describing more significant changes in behavior are correctly classified by Gradient Boosted Trees. Furthermore, we investigate the importance of each feature in the classification task and uncover complex relationships between individuals' characteristics and their attitude towards behavioral change. We find that the intensity and recency of past illnesses, the perceived susceptibility to an infection, and its perceived severity are the most significant features in the classification task and are associated with significant changes in behavior. Overall, the research contributes to the small set of empirical studies devoted to the data-driven characterization of behavioral changes induced by infectious diseases.
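
The 66% figure is the share of samples in one category that the classifier labels correctly, which is that category's recall. The sketch below shows how such per-class recall can be computed from true and predicted labels. The label names and the toy predictions are hypothetical and are not the study's data.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return, for each class, the fraction of its samples that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels for six survey responses.
y_true = ["none", "moderate", "significant", "significant", "significant", "moderate"]
y_pred = ["none", "significant", "significant", "significant", "moderate", "moderate"]
recall = per_class_recall(y_true, y_pred)
```

Reporting recall per class, rather than a single accuracy value, matters here because the three categories are not equally sized (26%, 36%, and 38% of responses).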

    Monitoring Gender Gaps via LinkedIn Advertising Estimates: the case study of Italy

    Women remain underrepresented in the labour market. Although significant advancements are being made to increase female participation in the workforce, the gender gap is still far from being bridged. We contribute to the growing literature on gender inequalities in the labour market by evaluating the potential of LinkedIn estimates to monitor the evolution of gender gaps sustainably, complementing official data sources. In particular, we assess labour market patterns at a subnational level in Italy. Our findings show that the LinkedIn estimates accurately capture the gender disparities in Italy with respect to sociodemographic attributes such as gender, age, geographic location, seniority, and industry category. At the same time, we assess data biases such as the digitalisation gap, which affects the representativity of the workforce in an imbalanced manner, confirming that women are under-represented in Southern Italy. In addition to confirming the gender disparities reported in the official census, LinkedIn estimates are a valuable tool for providing dynamic insights; we show an immigration flow of highly skilled women, predominantly from the South. Digital surveillance of gender inequalities with detailed and timely data is particularly significant in enabling policymakers to tailor impactful campaigns.

    Developing Real Estate Automated Valuation Models by Learning from Heterogeneous Data Sources

    In this paper we propose a data acquisition methodology and a Machine Learning solution for the partially automated evaluation of real estate properties. The novelty and importance of the approach lie in two aspects: (1) when compared to Automated Valuation Models (AVMs) as available to real estate operators, it is highly adaptive and non-parametric, and integrates diverse data sources; (2) when compared to Machine Learning literature that has addressed real estate applications, it is more directly linked to the actual business processes of appraisal companies: in this context, prices that are advertised online are normally not the most relevant source of information, while an appraisal document must be proposed by an expert and approved by a validator, possibly with the help of technological tools. We describe a case study using a set of 7988 appraisal documents for residential properties in Turin, Italy. Open data were also used, including location, nearby points of interest, comparable property prices, and the Italian revenue service area code. The observed mean error, as measured on an independent test set, was around 21 K€, for an average property value of about 190 K€. The AVM described here can help the stakeholders in this process (experts, appraisal company) by providing a reference price to be used by the expert, allowing the appraisal company to validate its evaluations faster and more cheaply, and helping the expert list the comparable properties that need to be included in the appraisal document.
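
To put the reported figures in proportion, the sketch below relates the mean error to the average property value. It also includes a helper that computes a mean absolute error, on the assumption that "mean error" in the abstract means mean absolute error; the abstract does not say so explicitly. The helper name and the toy prices are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between appraised and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Figures quoted in the abstract, in thousands of euros.
reported_error_keur = 21
reported_mean_value_keur = 190
relative_error = reported_error_keur / reported_mean_value_keur  # roughly 0.11

# Hypothetical toy values (K€), only to show the helper in use.
toy_mae = mean_absolute_error([200, 180], [190, 200])
```

Under that reading, the error corresponds to roughly 11% of the average property value.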

    News and the city: understanding online press consumption patterns through mobile data

    Ever-increasing mobile connectivity affects every aspect of our daily lives, including how and when we keep ourselves informed and consult news media. By studying a DPI (deep packet inspection) dataset, provided by one of the major Chilean telecommunication companies, we investigate how different cohorts of the population of Santiago de Chile consume news media content through their smartphones. We find that some socio-demographic attributes are highly associated with specific news media consumption patterns. In particular, education and age play a significant role in shaping consumers' behaviour even in the digital context, in agreement with a large body of literature on off-line media distribution channels.

    The Impact of Disinformation on a Controversial Debate on Social Media

    In this work we study how pervasive disinformation is in the Italian debate around immigration on Twitter, and the role of automated accounts in the diffusion of such content. By characterising Twitter users with an Untrustworthiness score, which tells us how frequently they engage with disinformation content, we are able to see that such bad information consumption habits are not equally distributed across users; adopting a network analysis approach, we can identify communities characterised by a very high presence of users who frequently share content from unreliable news sources. Within this context, social bots tend to inject more malicious content into the network, which often remains confined to a limited number of clusters; at the same time, they target reliable content in order to diversify their reach. The evidence we gather suggests that, at least in this particular case study, there is a strong interplay between social bots and users engaging with unreliable content, influencing the diffusion of the latter across the network.
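
The abstract describes the Untrustworthiness score only as a measure of how frequently a user engages with disinformation content. As a rough illustration, the sketch below assumes the score is the fraction of a user's shared URLs that point to a domain on a list of unreliable sources. This definition, the function name, and the example domains are assumptions and may not match the paper's.

```python
from urllib.parse import urlparse


def untrustworthiness(shared_urls, unreliable_domains):
    """ASSUMED definition: share of a user's URLs hosted on unreliable domains."""
    if not shared_urls:
        return 0.0  # no sharing activity, so nothing to flag
    flagged = sum(
        1 for url in shared_urls if urlparse(url).hostname in unreliable_domains
    )
    return flagged / len(shared_urls)


# Hypothetical domains and URLs.
unreliable = {"unreliable.example"}
urls = [
    "https://unreliable.example/story-1",
    "https://reliable.example/report",
    "https://unreliable.example/story-2",
]
user_score = untrustworthiness(urls, unreliable)
```

A score of this kind lies between 0 and 1, which makes users comparable regardless of how much they post.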

    Immigration as a Divisive Topic: Clusters and Content Diffusion in the Italian Twitter Debate

    In this work, we apply network science to analyse almost 6 M tweets about the debate around immigration in Italy, collected between 2018 and 2019, when many related events captured media outlets' attention. Our aim was to better understand the dynamics underlying the interactions on social media on such a delicate and divisive topic, which actors are leading the discussion, and whose messages have the highest chance of reaching the majority of the accounts that are following the debate. The debate on Twitter is represented with networks; we provide a characterisation of the main clusters by looking at the highest in-degree nodes in each one and by analysing the text of the tweets of all the users. We find a strongly segregated network which shows an explicit interplay with the Italian political and social landscape, but which seems to be disconnected from the actual geographical distribution and relocation of migrants. In addition, quite surprisingly, the influencers and political leaders that apparently lead the debate do not necessarily belong to the clusters that include the majority of nodes: we find evidence of the existence of a 'silent majority' that is more connected to accounts that express a more positive stance toward migrants, while leaders whose stance is negative apparently attract more attention. Finally, we see that the community structure clearly affects the diffusion of content (URLs) by identifying the presence of both local and global trends of diffusion, and that communities tend to display segregation regardless of their political and cultural background. In particular, we observe that messages that spread widely in the two largest clusters, whose most popular members are also notoriously at opposite sides of the political spectrum, have a very low chance of gaining visibility in other clusters.
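
Characterising clusters by their highest in-degree nodes can be sketched as follows. The sketch assumes that an edge points from the account that interacts (for example, by retweeting) to the account that receives the interaction, that cluster membership has already been computed by some community detection step not shown here, and that in-degree counts incoming edges from the whole network. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def top_in_degree_by_cluster(edges, cluster_of, k=1):
    """Return the k nodes with the highest in-degree in each cluster."""
    # In-degree counts every incoming edge, whichever cluster it comes from.
    in_degree = Counter(target for _, target in edges)
    by_cluster = defaultdict(list)
    for node, cluster in cluster_of.items():
        by_cluster[cluster].append(node)
    return {
        cluster: sorted(nodes, key=lambda n: (-in_degree[n], n))[:k]
        for cluster, nodes in by_cluster.items()
    }


# Hypothetical directed edges (source -> target) and cluster labels.
edges = [("u1", "a"), ("u2", "a"), ("u3", "b"), ("u1", "b"), ("u4", "b")]
cluster_of = {"a": 0, "u1": 0, "u2": 0, "b": 1, "u3": 1, "u4": 1}
leaders = top_in_degree_by_cluster(edges, cluster_of)
```

Ties are broken alphabetically so that the result is deterministic.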